Protein Complex Inference enhanced by Text Mining

Authors

  • Lee Yu
  • Ling Joanne
  • Upali Kohomban

Abstract

Protein complexes play a vital role in living organisms, as they regulate and execute biological processes. Because experimental methods for extracting protein complexes are fraught with difficulties, scientists have turned to computational prediction of protein complexes. However, the protein-protein interaction (PPI) data used to predict protein complexes are often noisy and incomplete. Since the published literature may hold a wealth of PPI data that goes unnoticed, this paper aims to enhance the prediction of protein complexes by text mining PPI data from literature abstracts. We explore various rule-based methods of extracting PPI data from MEDLINE abstracts. We also show that removing non-hub proteins can reduce the impact of noisy PPI data on protein complex prediction and can retrieve smaller, more accurate protein complexes that CMC (Clustering based on Maximal Cliques) would otherwise discard. Finally, we show that selecting which abstracts to use for augmentation is worthwhile for overcoming the incompleteness of PPI data.

Acknowledgements

I would like to express my deepest gratitude to my supervisor, Professor Wong Limsoon, for his patience and guidance during the course of this project. I would also like to thank Dr Liu Guimei for providing me with the source code for CMC and for evaluation using GO, Dr Rajesh Chowdhary for providing me with the source code for BN and for his guidance in using the program, Su Zhan for his advice on downloading MEDLINE abstracts, and Dr Upali Kohomban for his help in tagging the abstracts.
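
The abstract names two techniques without spelling them out: rule-based extraction of PPIs from MEDLINE abstracts, and removal of non-hub proteins before complex prediction. The sketch below illustrates one simple form of each, under our own assumptions rather than the thesis's actual method. A pair of known protein names is reported when both occur in a sentence that also contains an interaction trigger word (the list `INTERACTION_WORDS` is hypothetical), and a "hub" is taken to be a protein whose degree in the PPI graph is at least a chosen `min_degree`. The function names `extract_ppis` and `remove_non_hubs` are illustrative, exact name matching stands in for a proper protein-name tagger, and nothing here reproduces CMC.

```python
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical trigger stems (assumption); the thesis's actual rules are not given in the abstract.
INTERACTION_WORDS = ("interact", "bind", "associat", "complex", "phosphorylat")


def extract_ppis(abstract, protein_names):
    """Return sorted name pairs that co-occur in a sentence containing a trigger word."""
    pairs = set()
    # Naive sentence split on ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", abstract):
        lowered = sentence.lower()
        if not any(word in lowered for word in INTERACTION_WORDS):
            continue
        # Exact, case-sensitive, whole-word matching against a known name list
        # (a stand-in for a real protein-name tagger).
        found = sorted(p for p in protein_names
                       if re.search(rf"\b{re.escape(p)}\b", sentence))
        pairs.update(combinations(found, 2))
    return pairs


def remove_non_hubs(edges, min_degree):
    """Drop every edge touching a protein whose degree is below min_degree.

    Degrees are computed once on the input graph (a single pass, not iterated).
    The degree threshold is an assumed definition of "hub".
    """
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return {(a, b) for a, b in edges
            if degree[a] >= min_degree and degree[b] >= min_degree}


if __name__ == "__main__":
    # Toy input sentences, not data from the thesis.
    abstract = ("Cdc28 binds Cln2 in vivo. Cln2 was also detected. "
                "Cdc28 interacts with Cks1 and Cln2. Sic1 associates with Cdc28.")
    proteins = {"Cdc28", "Cln2", "Cks1", "Sic1"}
    edges = extract_ppis(abstract, proteins)
    print(sorted(edges))
    # Sic1 has degree 1, so its edge to Cdc28 is dropped at min_degree=2.
    print(sorted(remove_non_hubs(edges, min_degree=2)))
```

Sentence-level co-occurrence rules like these tend to favour recall over precision, so text-mined edges can be noisy; a degree-based filter is one simple way to prune them, though the thesis's own rules and hub criterion may differ.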

Similar sources

Ontology-based document clustering with a fuzzy approach

Data mining, also known as knowledge discovery in databases, is the process of discovering unknown knowledge from a large amount of data. Text mining applies data mining techniques to extract knowledge from unstructured text. Text clustering is one of the important techniques of text mining, which is the unsupervised classification of similar documents into different groups. The most important step...

PubMiner: Machine Learning-based Text Mining for Biomedical Information Analysis

In this paper, we introduce PubMiner, an intelligent machine-learning-based text mining system for mining biological information from the literature. PubMiner employs natural language processing techniques and machine-learning-based data mining techniques for mining useful biological information such as protein-protein interactions from the massive literature. The system recognizes biological term...

Towards Application of Text Mining for Enhanced Power Network Data Analytics - Part II: Offline Analysis of Textual Data

Text mining is a subdivision of data mining technologies used to extract useful information from unstructured textual data. In recent years, power distribution networks have become more complex due to the versatile consumer demand and integration of distributed energy resources. This has led to the need for enhanced data processing and analysis, i.e., data analytics, in distribution system stud...

Improving the extraction of complex regulatory events from scientific text by using ontology-based inference

BACKGROUND The extraction of complex events from biomedical text is a challenging task and requires in-depth semantic analysis. Previous approaches associate lexical and syntactic resources with ontologies for the semantic analysis, but fall short in testing the benefits from the use of domain knowledge. RESULTS We developed a system that deduces implicit events from explicitly expressed even...

Semantic reclassification of the UMLS concepts

UNLABELLED Accurate semantic classification is valuable for text mining and knowledge-based tasks that perform inference based on semantic classes. To benefit applications using the semantic classification of the Unified Medical Language System (UMLS) concepts, we automatically reclassified the concepts based on their lexical and contextual features. The new classification is useful for auditin...

Publication date: 2010